ABSTRACT

In this paper, the authors derive the large-sample distribution of the t
statistic based upon observations on the first principal component instead
of the original variables. It is shown that this statistic is distributed
asymptotically as Student's t distribution.
1. INTRODUCTION


Data analysts are often confronted with the problem of large dimensional
data. In some of these situations, it is customary to reduce the dimensionality
of the problem by using principal component analysis and to perform statistical
analysis of the data using the new variables (principal components). For
example, the new variables are used in the area of classification. Chestnut and
Floyd (1981) used the principal components as variables in identification of
underwater targets. However, the statistical data analysis using the principal
components is ad hoc since the distributions of the test statistics based upon the
principal components are complicated when the covariance matrix is unknown.

Very little work has been done in the literature on deriving the distributions
of these test statistics even in the asymptotic case. In this paper, we derive
the asymptotic distribution of the t statistic based upon the new variable
(the most important principal component) instead of using any of the original
variables. The above asymptotic distribution is shown to be Student's t
distribution. The accuracy of the above approximation is studied by comparing the
simulated values using the asymptotic expression with the standard Student's
t table. It is found that the accuracy of the above approximation is sufficient
for many practical situations.
2. ASYMPTOTIC DISTRIBUTION OF t-STATISTIC
BASED UPON A PRINCIPAL COMPONENT


Consider a random matrix $X = (X_1, \ldots, X_{n+1})$: $p \times (n+1)$ whose columns are
distributed independently as multivariate normal with a common covariance
matrix $\Sigma$ and mean vector $\mu$. Now,

$$E(S/n) = \Sigma \tag{2.1}$$

where $S = \sum_{i=1}^{n+1} (X_i - \bar{X})(X_i - \bar{X})'$ and $\bar{X} = \sum_{i=1}^{n+1} X_i/(n+1)$.
Let $\Gamma$: $p \times p$ be an orthogonal matrix such that
$\Gamma' \Sigma \Gamma = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and $\lambda_1 \ge \ldots \ge \lambda_p$. Also, let $G$ be an orthogonal matrix
such that $G'(S/n)G = L = \mathrm{diag}(l_1, \ldots, l_p)$ and $l_1 \ge \ldots \ge l_p$. Now, let

so that

where $Z = \Gamma' Y \Gamma = (z_{ij})$. So,

$$\Lambda H + Z H = H L \tag{2.4}$$


where $H = \Gamma' G$. Now, let $\Gamma = (\gamma_{ij})$ and $G = (g_{ij})$. It is known (see Mallows (1961),
Fang and Krishnaiah (1981)) by applying perturbation technique that for $\lambda_{i-1} > \lambda_i > \lambda_{i+1}$,

Using $H = \Gamma' G$, we obtain

Under the assumption of a single non-isotropic principal component, the eigenvalue
$\lambda_1$ is simple. Let the corresponding eigenvector be denoted by $\gamma_1$. Let
$g_1 = (g_{11}, \ldots, g_{p1})'$ be the sample eigenvector corresponding to the largest
eigenvalue of $S/n$, and


$$g_1 = \gamma_1 + g_1(n^{-1/2}) + g_1(n^{-1}) + O(n^{-3/2}) \tag{2.8}$$


according to Eq. (2.7). Now consider the statistic

We know that

Since $\sqrt{n}(\bar{X} - \mu)$ is of order $O_p(1)$, and the $Y_{ij}$'s in $g_1(n^{-r/2})$, $r = 1, 2, \ldots$, are
also of order $O_p(1)$, the order of probability convergence in Eqs. (2.10), (2.11)
is valid according to the Chernoff-Pratt definition of $o_p$ (Bishop, Fienberg
and Holland (1975)).


The statistic

$$\sqrt{n}\, g_1'(\bar{X} - \mu) \qquad \sqrt{n}\, \gamma_1'(\bar{X} - \mu) \tag{2.12}$$


So the statistic T converges in distribution to Student's t distribution with
n degrees of freedom.
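
One way to see why a t limit is plausible: if the population eigenvector $\gamma_1$ were known and used in place of $g_1$, the studentized projection would follow Student's t distribution exactly. The sketch below assumes the usual studentized form, with $\gamma_1' (S/n) \gamma_1$ under the square root in the denominator; the symbols $y_i$ and $\bar{y}$ are introduced here only for illustration.

$$
\begin{aligned}
y_i &= \gamma_1' X_i \sim N(\gamma_1' \mu,\ \lambda_1), \quad i = 1, \ldots, n+1 \ \text{(independent)}, \\
\bar{y} &= \gamma_1' \bar{X}, \qquad \sum_{i=1}^{n+1} (y_i - \bar{y})^2 = \gamma_1' S \gamma_1, \\
\frac{\sqrt{n+1}\,(\bar{y} - \gamma_1' \mu)}{\sqrt{\gamma_1' S \gamma_1 / n}} &\sim t_n .
\end{aligned}
$$

Replacing $\sqrt{n+1}$ by $\sqrt{n}$ changes this ratio only by the factor $\sqrt{n/(n+1)}$, which tends to 1. Since the sample eigenvector $g_1$ differs from $\gamma_1$ by terms that vanish in probability, the same limiting behaviour is expected for the statistic based on $g_1$.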

Suppose we wish to test the hypothesis that $\gamma_1' \mu = 0$. Then, we use

$$\sqrt{n}\, g_1' \bar{X}$$

as a test statistic.
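
As a rough illustration of how such a statistic could be computed from data, the following Python sketch projects the sample mean onto the leading sample eigenvector of $S/n$ and studentizes it. Several things here are assumptions rather than the authors' procedure: the denominator $\sqrt{g_1'(S/n)g_1}$ is the usual studentized form and is not taken from the text, the function names `leading_eigenvector` and `t_statistic_first_pc` are hypothetical, and plain power iteration stands in for a library eigen-solver. The sign of $g_1$ is arbitrary, so only the magnitude of the result is meaningful for a two-sided test.

```python
import math


def leading_eigenvector(a, iters=1000, tol=1e-12):
    """Power iteration: unit eigenvector for the largest eigenvalue of a
    symmetric positive semi-definite matrix `a` (given as a list of lists)."""
    p = len(a)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v


def t_statistic_first_pc(sample, mu0=None):
    """sample: list of n+1 observations, each a list of p floats.
    Returns sqrt(n) * g1'(xbar - mu0) / sqrt(g1' (S/n) g1)   (assumed form)."""
    m = len(sample)            # m = n + 1 observations
    n = m - 1
    p = len(sample[0])
    mu0 = [0.0] * p if mu0 is None else mu0
    xbar = [sum(x[j] for x in sample) / m for j in range(p)]
    # S/n: sums of squares and cross-products about the mean, divided by n
    s_over_n = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in sample) / n
                 for j in range(p)] for i in range(p)]
    g1 = leading_eigenvector(s_over_n)
    num = math.sqrt(n) * sum(g1[j] * (xbar[j] - mu0[j]) for j in range(p))
    var = sum(g1[i] * s_over_n[i][j] * g1[j]
              for i in range(p) for j in range(p))
    return num / math.sqrt(var)


# Toy data: all variation lies along the first coordinate, so g1 = (1, 0)'.
sample = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
print(t_statistic_first_pc(sample))  # about 2.83, i.e. sqrt(2) * 2 / 1
```

In practice the absolute value of the result would be compared with a tabulated Student's t critical value with n degrees of freedom.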


3. AN EMPIRICAL STUDY ON THE ACCURACY
OF THE APPROXIMATION

In this section, we study the accuracy of the asymptotic expression given
in the preceding section. In Table 1, the entries in the rows corresponding
to $t$ give the values of $t_\alpha$ where

$$= (1 - \alpha) \tag{3.1}$$

and $t$ is distributed as Student's t distribution with $n$ degrees of freedom.
The entries in the rows corresponding to $\hat{\alpha}$ are the simulated values of $\alpha$ obtained
by using the IMSL subroutines GGNSM, EIGRS for the Monte Carlo methods. In
computing the simulated values, 5000 trials were performed and each trial
consisted of a random sample of size $n+1$ from a multivariate normal population with
covariance matrix $\Sigma$. The entries in the table are computed
for different values of $n$, $\lambda_1$, $\lambda_2$, $\lambda_3$ and $p$. From the table, we observe
that the approximation is satisfactory when $n$ is moderately large like 23.
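
A Monte Carlo check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reproduction of the study: Python's `random` module and a simple power iteration stand in for the IMSL routines GGNSM and EIGRS, the covariance matrix is taken to be diagonal with hypothetical eigenvalues (10, 1, 1), the mean vector is zero, the statistic uses the assumed denominator $\sqrt{g_1'(S/n)g_1}$, and the rejection rule is two-sided with the tabulated 5% critical value 2.069 for 23 degrees of freedom. The function names are hypothetical.

```python
import math
import random


def leading_eigenvector(a, iters=1000, tol=1e-12):
    """Power iteration for the leading unit eigenvector of a symmetric PSD matrix."""
    p = len(a)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(x - y) for x, y in zip(w, v)) < tol:
            return w
        v = w
    return v


def t_statistic_first_pc(sample):
    """sqrt(n) * g1' xbar / sqrt(g1' (S/n) g1) for n+1 observations (assumed form)."""
    m = len(sample)
    n = m - 1
    p = len(sample[0])
    xbar = [sum(x[j] for x in sample) / m for j in range(p)]
    s_over_n = [[sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in sample) / n
                 for j in range(p)] for i in range(p)]
    g1 = leading_eigenvector(s_over_n)
    num = math.sqrt(n) * sum(g1[j] * xbar[j] for j in range(p))
    var = sum(g1[i] * s_over_n[i][j] * g1[j]
              for i in range(p) for j in range(p))
    return num / math.sqrt(var)


def simulate_alpha_hat(n, lambdas, t_crit, trials=5000, seed=0):
    """Fraction of trials with |T| > t_crit when the true mean is zero and
    the covariance matrix is diag(lambdas) (both are assumptions)."""
    rng = random.Random(seed)
    sds = [math.sqrt(lam) for lam in lambdas]
    rejections = 0
    for _ in range(trials):
        # One trial: n+1 independent draws from N(0, diag(lambdas))
        sample = [[rng.gauss(0.0, sd) for sd in sds] for _ in range(n + 1)]
        if abs(t_statistic_first_pc(sample)) > t_crit:
            rejections += 1
    return rejections / trials


# Hypothetical configuration: n = 23, p = 3, eigenvalues (10, 1, 1).
alpha_hat = simulate_alpha_hat(n=23, lambdas=[10.0, 1.0, 1.0], t_crit=2.069)
print(f"simulated alpha: {alpha_hat:.3f} (nominal 0.05)")
```

Comparing the printed proportion with the nominal level gives a rough sense of how close the t approximation is for a given n and eigenvalue configuration; the result varies with the seed and the number of trials.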

The approximation is not good when $\alpha$ is small and $n = 10$. But, the accuracy
of the approximation increased as $\alpha$ increased even when $n = 10$. From Tables
2 and 3, we observe that the approximation is good when $n = 23$ and $\alpha$ increases